3 Norms and Condition Number
3.1 Vector Norms
A norm on a vector space \(V\) defines a function that measures the “length” of each vector. There are different kinds of norms, but we will ask that they all satisfy certain properties one would expect of a length-measuring function. We need to start with a norm on the field of scalars.
Definition 3.1 We say that a field \(K\) has a norm, or absolute value, if there is a function \(|\cdot|:K\to \mathbb R\) such that for all \(a,b\in K\), we have
- \(|a|\geq 0\), and \(|a| = 0\) if and only if \(a=0\),
- \(|ab| = |a||b|\),
- \(|a+b| \leq |a| + |b|\).
The third condition is called the triangle inequality.
A typical example is the standard absolute value on \(\mathbb R\) or \(\mathbb C\). Once the scalars have a norm, we define vector norms in a similar fashion.
Definition 3.2 Let \(V\) be a vector space over a field \(K\) with an absolute value \(|\cdot |\) (as usual, you can think of \(K\) as \(\mathbb Q\), \(\mathbb R\), or \(\mathbb C\) and its usual absolute value). A norm on \(V\) is a function \(|\cdot|: V \to \mathbb R\) such that for any \(u,v\in V\) and \(c\in K\), we have
- \(|v|\geq 0\), and \(|v| = 0\) if and only if \(v=0\),
- \(|cv| = |c||v|\),
- \(|u+v| \leq |u| + |v|\).
(we use the same symbol for the field and vector norms)
Example 3.1 Suppose \(V\) is equipped with an inner product \(\cdot\). Then, the function \(V \to \mathbb R\) given by \(v \mapsto \sqrt{v\cdot v}\) defines a norm. Therefore, every inner product space is a normed vector space.
Example 3.2 Let \(p\) be a positive integer. The \(L_p\) norm on \(\mathbb R^n\) is defined by \[\|(x_1, x_2, \dots, x_n)\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{\frac{1}{p}}\]
The case \(p=2\) is the familiar Euclidean norm. The triangle inequality for general \(L_p\) norms is not easy to verify; it is known as Minkowski's inequality, and its standard proof relies on Hölder's inequality.
Example 3.3 The \(L_\infty\) norm, sometimes called the sup norm, on \(\mathbb R^n\) is defined by \[\|(x_1, x_2, \dots, x_n)\|_\infty = \max\{|x_1|,...,|x_n|\}\]
The \(L_\infty\) and \(L_p\) norms share notation because the \(L_p\) norms converge to the \(L_\infty\) norm as \(p\to\infty\). Intuitively, taking large \(p\) amplifies the contribution of the coordinate with the largest absolute value, and the \(p\)th root then returns approximately that value.
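A quick numerical illustration of this convergence (a minimal sketch assuming NumPy; the vector and exponents are arbitrary choices):

```python
import numpy as np

v = np.array([3.0, -1.0, 4.0, 1.0, -5.0])

# The L_p norm for increasing p approaches the L_infinity norm.
for p in [1, 2, 4, 8, 16, 32, 64]:
    print(p, np.linalg.norm(v, ord=p))

print("inf", np.linalg.norm(v, ord=np.inf))  # max absolute entry, here 5.0
```

The printed values decrease toward \(5.0\), the largest absolute entry.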
3.2 Limits in Vector Spaces
A norm allows us to discuss the notion of distance. For example, the distance between real numbers \(x\) and \(y\) is \(|x-y|\). This construction works just as well for normed vector spaces. Once we have a notion of distance, we can talk about vectors getting close to other vectors, which leads us to limits.
In fact, the notion of convergence for a sequence of vectors is identical to that of a sequence of real numbers:
Definition 3.3 Let \(V\) be a normed vector space and \((v_n)_{n\in\mathbb N}\) be a sequence of vectors \(v_n\in V\). Another vector \(v\in V\) is said to be the limit of this sequence if, for all \(\epsilon > 0\) there exists an index \(N\) such that for all \(n> N\) \[|v_n - v| < \epsilon.\]
Limits of functions can also be generalized to this setting. If \(f:V\to W\) is a map between normed vector spaces, we say that \(f(x) \to w\) as \(x\to v\) if, for all \(\epsilon > 0\) there exists \(\delta\) such that \(|f(x) - w| < \epsilon\) whenever \(|x-v| < \delta\). We say that \(f\) is continuous at \(v\) if \(f(v) = \lim_{x\to v} f(x)\).
The limit of a sequence appears to depend on the choice of vector norm. In Example 3.2, we saw that there are many different norms on \(\mathbb R^n\) all compatible with the usual absolute value on \(\mathbb R\). It is natural to wonder, then, how these limits might differ. Remarkably, it turns out that every norm on \(\mathbb R^n\) gives rise to the same notion of limit when \(n\) is finite.
To start, we’ll establish a sufficient criterion for two norms to agree on all limits.
Theorem 3.1 Let \(|\cdot|_a\) and \(|\cdot|_b\) be two norms on a vector space \(V\). The limits of all sequences of vectors with respect to \(|\cdot|_a\) and \(|\cdot|_b\) coincide if and only if there are positive constants \(C\) and \(D\) such that for all \(v\in V\) \[ C|v|_a \leq |v|_b \leq D|v|_a. \]
Proof. Take \(C,D\) as in the statement of the theorem. Let \((v_n)\) be a sequence of vectors in \(V\), and suppose \(v_n \to v\) with respect to \(|\cdot|_a\). By definition, this means that for all \(\epsilon' > 0\) there is an \(N\) such that \(n > N\) implies \(|v_n - v|_a < \epsilon'\).
We aim to show that \(v_n\to v\) with respect to \(|\cdot|_b\). So, let \(\epsilon > 0\). Since \(\epsilon'\) is just an arbitrary number, we may set \(\epsilon'=\epsilon/D\). With this choice, we have \[|v_n - v|_a < \epsilon/D.\] Clearing the denominator and applying the relationship between the norms, we obtain \[\epsilon > D|v_n-v|_a \geq |v_n-v|_b\] for all \(n> N\). Thus, \(v_n\to v\) with respect to \(|\cdot|_b\).
A symmetric argument using \(C\) rather than \(D\) shows that \(v_n\to v\) with respect to \(|\cdot|_b\) implies \(v_n\to v\) with respect to \(|\cdot |_a\).
Conversely, assume that for all sequences the \(|\cdot|_a\) and \(|\cdot|_b\) limits agree. We will show that there are some \(C,D\) such that \(C|v|_a \leq |v|_b \leq D|v|_a\) for all \(v\in V\). It will suffice to find \(D\); a similar argument yields \(C\).
By the scaling property of norms, it suffices to show that \(|v|_b \leq D|v|_a\) only for those \(v\) with \(|v|_a = 1\); in other words, we seek some \(D\) such that \(|v|_a = 1\) implies \(|v|_b \leq D\). If there were no such \(D\), then there would be \(v \in V\) with \(|v|_a = 1\) but \(|v|_b\) arbitrarily large. In particular, for every \(n\in \mathbb N\) we can find a \(v_n'\) such that
- \(|v_n'|_a = 1\)
- \(|v_n'|_b \geq n^2\)
from which we can form the sequence \(v_n = \frac 1 n v_n'\).
By construction, \(|v_n|_a = 1/n \to 0\), so \(v_n \to 0\) with respect to \(|\cdot|_a\) and in particular converges. But \(|v_n|_b = \frac 1n |v_n'|_b \geq n \to \infty\), and an unbounded sequence cannot converge (a convergent sequence is bounded, by the triangle inequality), so \((v_n)\) does not converge with respect to \(|\cdot |_b\), contrary to assumption.
Remark 3.1. The unit disk in \(\mathbb R^2\) is the set of vectors \((x,y)\) of norm at most \(1\); its boundary, where the norm equals \(1\), is the unit circle. In higher dimensions, and in more general normed vector spaces, we can still define this kind of set. It is called the unit ball: the set of \(v\in V\) such that \(|v|\leq 1\). The unit sphere is the set of \(v\in V\) with \(|v|\) exactly \(1\). See the norm demo to visualize various unit \(L_p\)-balls/spheres in \(\mathbb R^2\).
One can translate the theorem above into a statement about the geometry of the unit balls for each norm: \(|\cdot|_a\) and \(|\cdot|_b\) are equivalent if it is possible to scale the unit ball for \(|\cdot|_a\) to fit inside the unit ball for \(|\cdot|_b\) and vice-versa.
For any vector \(v\in V\), the set of vectors \(x\in V\) such that \(|v-x| < \epsilon\) is exactly the unit ball scaled by \(\epsilon\) and translated to be centered at \(v\), so if the unit balls are comparable in size, then so is the notion of \(\epsilon\)-closeness.
We say that norms are equivalent if they satisfy the hypotheses of Theorem 3.1; in light of that theorem, norms are equivalent precisely when they agree on all limits.
As indicated above, it turns out that all norms on a finite-dimensional vector space are equivalent. Our definitions of the \(L_p\) norms implicitly depend on a basis, and it is useful to know that the choice of basis doesn’t matter. But this fact is even stronger: we could pick an entirely different norm, whichever is most convenient for the problem at hand. The \(L_2\) norm is familiar, but relatively complicated compared to the \(L_1\) and \(L_\infty\) norms. Many of our proofs will use one of the latter.
The following theorem makes this precise.
Theorem 3.2 All norms on \(\mathbb R^n\) are equivalent to the \(L_\infty\) norm.
Proof. Abbreviate \(V = \mathbb R^n\) with standard ordered basis \((e_1,...,e_n)\). Given \(v\in V\), let \(v_i \in \mathbb R\) be its coordinates with respect to that basis, meaning \[v = \sum_{i=1}^n v_i e_i.\]
Let \(|\cdot|: V\to \mathbb R\) be a norm. We seek \(C,D\) such that \[ C|v|_\infty \leq |v| \leq D|v|_\infty. \]
As in the proof of Theorem 3.1, we may assume that \(|v|_\infty = 1\) by the scaling property of norms. This reduces us to showing that there are \(C,D\) such that \(C\leq |v| \leq D\) whenever \(|v|_\infty = 1\). Note that \(|v|_\infty\) is the maximum of the absolute values of the coordinates, so \(|v|_\infty = 1\) means that \(|v_i|\leq 1\) for all \(i\), and there is at least one \(k\) such that \(|v_k| = 1\).
The upper bound is straightforward. Let \(M = \max\{|e_1|,...,|e_n|\}\).
\[\begin{align*} |v| &= \left| \sum_{i=1}^n v_ie_i\right|\\ &\leq \sum_{i=1}^n |v_i||e_i|\\ &\leq \sum_{i=1}^n M\\ &= nM, \end{align*}\]
so \(D=nM\) will do the job.
The lower bound requires some topological/analytic facts which are slightly beyond the scope of this course. There are two main ingredients: the norm \(|\cdot|\) viewed as a function from \(V\) to \(\mathbb R\) is continuous for \(|\cdot|_\infty\), and that the extreme value theorem applies to the \(|\cdot|_\infty\)-sphere, so \(|\cdot|\) achieves a minimum on it. That minimum will be \(C\). It must be nonzero, because every vector on the sphere is nonzero and \(|\cdot|\) is a norm.
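Although the lower bound is non-constructive in general, the equivalence is easy to observe numerically for specific norms. Here is a minimal sketch (assuming NumPy) checking the standard comparisons \(|v|_\infty \leq |v|_2 \leq |v|_1 \leq n|v|_\infty\) on random vectors in \(\mathbb R^n\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
for _ in range(1000):
    v = rng.standard_normal(n)
    linf = np.linalg.norm(v, np.inf)
    l2 = np.linalg.norm(v, 2)
    l1 = np.linalg.norm(v, 1)
    # Chained inequalities witnessing equivalence with the L_infinity norm.
    assert linf <= l2 + 1e-12
    assert l2 <= l1 + 1e-12
    assert l1 <= n * linf + 1e-12
```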
Corollary 3.1 If a sequence of vectors in \(\mathbb R^n\) is unbounded for one norm, it is unbounded for every norm.
The unproven facts used in Theorem 3.2 will be useful again, so we record them in a lemma.
Lemma 3.1 Let \(V\) be a finite-dimensional vector space with norm \(|\cdot|\). If \(|\cdot|'\) is any other norm, then as a function from \(V\) to \(\mathbb R\), it is continuous with respect to \(|\cdot|\). In other words, for all \(v\in V\) and for all \(\epsilon > 0\), there is a \(\delta > 0\) such that \[|v-x| < \delta\] implies \[|v-x|' < \epsilon.\]
Additionally, the image of a sphere under a continuous real-valued map is bounded; in fact, the maximum and minimum values are attained (this is the extreme value theorem used above).
3.3 Matrix Norms
Next, we’d like to extend the notion of norms from vector spaces to linear maps. The approach is what you might expect: we compare lengths of vectors before and after applying the map.
Definition 3.4 Let \(V\) and \(W\) be finite-dimensional vector spaces with norms \(|\cdot|_V\) and \(|\cdot|_W\), respectively. Given a linear map \(T:V\to W\), the induced matrix norm \(||T||\) is given by the supremum \[||T|| = \sup \{|T(v)|_W/|v|_V\ :\ v\in V, v\neq 0\}.\]
Of course, for the supremum to be defined, we need to know that the set in question is bounded.
Proposition 3.1 The set \[\{|T(v)|_W/|v|_V\ :\ v\in V, v\neq 0\}\] is bounded.
Proof. Note that by scaling the set \[\{|T(v)|_W/|v|_V\ :\ v\in V, v\neq 0\}\] is the same as the set \[\{|T(v)|_W/|v|_V\ :\ v\in V, |v|_V=1\},\] better known as \[\{|T(v)|_W\ :\ v\in V, |v|_V=1\}.\]
This is the image of a sphere under the map \(v\mapsto |T(v)|_W\), so by Lemma 3.1, it suffices to show that this map is continuous. The norm on \(W\) is continuous, so we only need to check that all linear maps \(T\) are continuous. But this is straightforward: we want to show that \(|T(v)-T(x)|_W\) is small when \(|v-x|_V\) is small. By linearity, \(|T(v)-T(x)|_W = |T(v-x)|_W\), so we may as well set \(y=v-x\) and show that \(|T(y)|_W\) is small when \(|y|_V\) is small. By Theorem 3.2 we may take ordered bases \((e_1,...,e_m)\) and \((f_1,...,f_m)\) for \(V\) and \(W\), and assume that \(|\cdot|_V\) is the \(L_1\) norm and \(|\cdot|_W\) is the \(L_\infty\) norm.
Now, let \(\epsilon > 0\). Our target is \(|T(y)|_W < \epsilon\). Let \(T\) be represented by a matrix \((t_{ij})\) with respect to these bases, and set \(M= \max \{|t_{ij}|\}\). Write \(y\) in terms of the basis as \(\sum a_ie_i\). Then by definition, \[T(y) = \sum_j\left(\sum_i t_{ji} a_i\right)f_j,\] and therefore \[|T(y)|_W = \max_j\{\left|\sum_i t_{ji} a_i\right|\}.\]
We can bound the term on the right, since \[\left|\sum_i t_{ji} a_i\right|\leq \sum _i |t_{ji}| |a_i| \leq \sum_i M|a_i| = M |y|_1.\]
Therefore, if \(|y|_V < \delta = \epsilon/M\), we will have \(|T(y)|_W < \epsilon\).
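To get a feel for Definition 3.4, one can approximate the supremum by sampling unit vectors: every sampled ratio \(|T(v)|_W/|v|_V\) is a lower bound for \(||T||\). A minimal sketch (an arbitrary \(2\times 2\) matrix with the \(L_2\) norm on both sides; NumPy's `np.linalg.norm(A, 2)` gives the exact induced value for comparison):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, -1.0],
              [0.5,  3.0]])

# Sample vectors on the L_2 unit sphere; each value |Av| is a lower bound
# for the induced norm, and the supremum is approached from below.
best = 0.0
for _ in range(10_000):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)          # now |v|_2 = 1
    best = max(best, np.linalg.norm(A @ v))

print("sampled estimate:", best)
print("exact induced 2-norm:", np.linalg.norm(A, 2))
```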
We call this a matrix norm because it has all the same properties as the norms we’ve considered before:
Theorem 3.3 Given normed vector spaces \(V\) and \(W\), the induced matrix norm \(||\cdot||\) of Definition 3.4 has the following properties:
- \(||T|| \geq 0\), with equality if and only if \(T\) is the zero map,
- \(||cT|| = |c| ||T||\) for scalars \(c\in K\),
- \(||S+T|| \leq ||S||+||T||\)
(sums and scalar multiples of linear maps are still linear maps; in fact we can view the set of linear maps from \(V\to W\) as a vector space in its own right, and this says that the induced matrix norm is a vector norm on this space)
Proof. Well, by definition \(||T||\) is the supremum of a set of nonnegative numbers, so it’s nonnegative. Moreover, if \(T\neq 0\) then there must be some \(v\in V\) such that \(T(v)\neq 0\), and hence \(|T(v)|_W >0\). Of course, if \(T(v)\neq 0\) then, \(v\neq 0\), so \(|v|_V\neq 0\) as well, and hence \(|T(v)|_W/|v|_V > 0\) appears in the set defining \(||T||\), so \(||T|| \geq |T(v)|_W/|v|_V > 0\).
Given \(c\in K\), scaling for \(|\cdot|_W\) yields \[\frac{|cT(v)|_W}{|v|_V} = |c|\frac{|T(v)|_W}{|v|_V}\] so the set defining \(||cT||\) can be obtained by scaling that for \(||T||\) by a factor of \(|c|\), hence their suprema are related by a factor of \(|c|\) too.
We leave the triangle inequality as an exercise (see HW2 - Written).
Having defined norms for vectors and linear maps, the natural question to consider next is how they are related: knowing \(||T||\) and \(|v|\), what can be said about \(|T(v)|\)? One might hope that matrix multiplication behaves just like scalar multiplication, so that \(|T(v)| = ||T|||v|\). This turns out to be too much to ask, but we get something similar, called submultiplicativity.
Theorem 3.4 Let \(T:V\to W\) be a linear map between normed vector spaces. Then for all \(v\in V\), \[|T(v)| \leq ||T|| |v|.\]
Proof. There is nothing to prove if \(v=0\).
By definition,
\[||T|| = \sup \{|T(x)|_W/|x|_V\ :\ x\in V, x\neq 0\},\]
If we take \(x=v\), then it follows \[||T|| \geq |T(v)|_W/|v|_V,\] and hence \[|T(v)|_W \leq ||T|| |v|_V.\]
What about matrix norms under composition? For simplicity, we only treat the case \(S,T:V\to V\) with the same norm on both sides.
Theorem 3.5 Let \(V\) be a normed vector space and \(S,T:V\to V\) linear maps. Then \[||S\circ T|| \leq ||S|| ||T||.\]
Proof. Let \(v\in V\) with \(|v|=1\). Applying Theorem 3.4 twice, we obtain: \[|S\circ T(v)| \leq ||S||\,|T(v)| \leq ||S||\,||T||\,|v| = ||S||\,||T||.\]
Therefore, the set defining \(||S\circ T||\) is bounded by \(||S||||T||\), so its supremum is bounded by the same quantity.
Corollary 3.2 Suppose \(T:V\to V\) is a linear endomorphism of a normed vector space and \(||T|| < 1\). Then for all \(v\in V\), \[T^n(v) \to 0\] and in fact \[T^n \to 0.\]
Proof. By induction on Theorem 3.4, we obtain \[|T^n(v)| \leq ||T||^n |v|.\]
When \(||T|| < 1\), the right hand side goes to zero, and if \(|T^n(v)| \to 0\) then \(T^n(v) \to 0\) by Exercise 3.6. Likewise, \(||T^n||\leq ||T||^n\) also goes to zero, so \(T^n \to 0\).
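A quick numerical illustration of Corollary 3.2 (a sketch; the matrix is an arbitrary choice whose induced \(\infty\)-norm is less than \(1\)):

```python
import numpy as np

A = np.array([[0.5, 0.3],
              [0.2, 0.4]])

# ||A||_inf is the maximum absolute row sum, here 0.8 < 1,
# so the powers A^n shrink toward the zero matrix.
print(np.linalg.norm(A, np.inf))
for n in [1, 5, 10, 20, 40]:
    print(n, np.linalg.norm(np.linalg.matrix_power(A, n), np.inf))
```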
3.4 Notable Matrix Norms
A few induced matrix norms can be made explicit, and they involve our favorite vector norms, \(L_1\), \(L_2\), and \(L_\infty\). We will use the following quite often.
Theorem 3.6 Let \(V=\mathbb R^n\) with some chosen basis and \(T:V\to V\) a linear transformation represented by the matrix \(A = [A_{ij}]\). The induced norm \(||T||_1\) from the vector \(L_1\) norm satisfies \[ ||T||_{1} =\max_{j}\sum_{i=1}^n|A_{ij}|. \] In other words, the \(L_1\) matrix norm is the largest of the absolute column sums of \(A\); the largest of the \(L_1\) norms of the columns.
Proof. Notice that
\[ ||Av||_1 = \sum_{i = 1}^{n}\left|\sum_{k = 1}^{n} A_{ik}v_k\right| \le \sum_{i = 1}^{n}\sum_{k = 1}^{n} |A_{ik}||v_k| = \sum_{k=1}^n |v_k|\sum_{i=1}^n|A_{ik}| \le \left(\max_{k}\sum_{i=1}^n|A_{ik}|\right)|v|_1. \]
This shows that \[ ||A||_{1} = \sup\{ ||Av||_1 \ :\ v\in \mathbb{R}^n, |v|_1 = 1\} \leq \max_{k}\sum_{i=1}^n|A_{ik}|. \]
To show the reverse inequality, we only need to show that \(\max_{k}\sum_{i=1}^n|A_{ik}|\) is actually attained as a value in the set. Simply let \(v\) be the \(k_0\)-th standard basis element, where \(k_0\) is a value of \(k\) achieving the maximum of \(\sum_{i=1}^n|A_{ik}|\).
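Since NumPy's `np.linalg.norm(A, 1)` returns the maximum absolute column sum, Theorem 3.6 is easy to check on an example (a sketch; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  0.0, -4.0],
              [-2.0,  5.0,  1.0]])

max_col_sum = np.max(np.sum(np.abs(A), axis=0))  # largest absolute column sum
print(max_col_sum, np.linalg.norm(A, 1))          # both print 12.0
```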
There is a companion result for \(L_\infty\) norms.
Theorem 3.7 As in Theorem 3.6, let \(A\in\mathbb{R}^{n\times n}\) represent the linear map \(T\). Then the matrix norm induced by the vector \(L_\infty\) norm is given by the maximum absolute row sum of \(A\); the maximum of the \(L_1\) norms of its rows.
Proof. Calculating directly,
\[\begin{align*} ||A||_\infty &= \sup\{ ||Av||_\infty: v\in \mathbb{R}^n, |v|_\infty = 1\}\\ &= \sup_{ |v|_\infty = 1}( \max_{1\le i\le n}|(Av)_i|)\\ &= \max_{1\le i\le n}(\sup_{ |v|_\infty = 1}|(Av)_i|)\\ &= \max_{1\le i\le n}\left(\sup_{ |v|_\infty = 1}\left|\sum_{k = 1}^n A_{ik}v_k\right|\right) \end{align*}\]
Now, to maximize \(\displaystyle \left|\sum_{k = 1}^n A_{ik}v_k\right|\) we only need to maximize \(v_k\) when \(A_{ik}\geq 0\) and minimize \(v_k\) when \(A_{ik}\leq 0\). However, remember that the absolute value of \(v_k\) is bounded by 1. Therefore, we choose
\[ v_k = \begin{cases} 1 &\text{if } A_{ik}\geq 0,\\ -1 &\text{if } A_{ik} < 0. \end{cases} \]
Then,
\[ ||A||_\infty = \max_{1\le i\le n}\left(\sup_{ |v|_\infty = 1}\left|\sum_{k = 1}^n A_{ik}v_k\right|\right) = \max_{1\le i\le n}\sum_{k = 1}^n |A_{ik}|. \]
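The analogous check for Theorem 3.7 (NumPy's `np.linalg.norm(A, np.inf)` returns the maximum absolute row sum; the matrix below is arbitrary):

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  0.0, -4.0],
              [-2.0,  5.0,  1.0]])

max_row_sum = np.max(np.sum(np.abs(A), axis=1))  # largest absolute row sum
print(max_row_sum, np.linalg.norm(A, np.inf))     # both print 10.0
```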
Remark 3.2. This is still true for complex vector spaces. Instead of using \(\pm 1\) to achieve the maximum, one uses the angular component (argument) of the entry to put them all on the positive real axis.
The calculation for the \(L_2\) norm is quite interesting. Though we won’t have much occasion to use it in this course, you will see singular values again in the future.
Theorem 3.8 Let \(V=\mathbb R^n\) with some chosen basis and \(T:V\to V\) a linear transformation represented by the matrix \(A = [A_{ij}]\). The \(L_2\) induced matrix norm satisfies
\[||A||_2 = \sqrt{\rho(A^TA)}\]
where \(\rho(A^TA)\) is defined as the largest eigenvalue of \(A^TA\).
Remark 3.3. Square roots of eigenvalues of \(A^TA\) are called singular values of \(A\), so we actually have that \(||A||_2\) is equal to the largest singular value of \(A\).
Proof. Let \(f(v)\) be the function given by
\[ f(v) = ||Av||_2^2 = (Av)^TAv = v^TA^TAv \]
We want to find its maximum subject to the constraint \(|v|_2 = 1\). It’s easy to see that the matrix \(A^TA\) is symmetric, so by a classical result of linear algebra, \(A^TA\) has nonnegative eigenvalues and can be diagonalized by an orthogonal matrix. Let \(\{\alpha_1, \dots, \alpha_n\}\) be the orthonormal eigenbasis, and let \(\{\lambda_1, \dots, \lambda_n\}\) be the corresponding eigenvalues. Take any \(v\in \mathbb{R}^n\), represented as
\[ v = c_1\alpha_1 + \cdots + c_n\alpha_n = \sum_{i = 1}^n c_i\alpha_i \]
Plugging this into \(f\), we have
\[\begin{align*} f(v) = f\left(\sum_{i = 1}^n c_i\alpha_i \right) &= \left( \sum_{i = 1}^n c_i\alpha_i^T \right)A^TA\left( \sum_{j = 1}^n c_j\alpha_j \right)\\ &= \sum_{i = 1}^n \sum_{j= 1}^n c_ic_j \left(\alpha_i^TA^TA\alpha_j \right) \end{align*}\]
Since the eigenbasis is orthonormal, we know \(\alpha_i\cdot \alpha_j\) is \(1\) if \(i=j\) and zero otherwise. Therefore, \[ \alpha_i^TA^TA\alpha_j = \begin{cases} \lambda_i & \text{if } i = j\\ 0& \text{if } i\neq j \end{cases} \]
Therefore, \[ f(v) = \sum_{i = 1}^n c_i^2 \lambda_i \]
Recall that \(|v|_2 = 1\) and \(c_i\) is the coordinate of \(v\) on an orthonormal basis, so we have \(\displaystyle \sum_{i = 1}^n c_i^2 = 1\). Now, suppose \(\lambda_m = \rho(A^TA)\) is the maximum eigenvalue. It follows that
\[ f(v) \leq \sum_{i = 1}^n c_i^2\lambda_m = \lambda_m \]
To see that this value is achieved, we just let \(v\) be a vector such that \(c_m = 1\) and \(c_i = 0\) for all \(i\neq m\).
Now
\[\begin{align*} ||A||_2 &= \sup\{ ||Av||_2: v\in \mathbb{R}^n, |v|_2 = 1\}\\ &= \sup\{ \sqrt{f(v)}: v\in \mathbb{R}^n, |v|_2 = 1\}\\ &= \sqrt{\sup\{ f(v): v\in \mathbb{R}^n, |v|_2 = 1\}}\\ &= \sqrt{\rho(A^TA)} \end{align*}\]
The second to last equality relies on the fact that square root is an increasing function.
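Numerically, Theorem 3.8 says that the induced \(2\)-norm, the largest singular value, and the square root of the largest eigenvalue of \(A^TA\) should all agree. A minimal sketch (assuming NumPy; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

spectral = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))  # sqrt of largest eigenvalue of A^T A
largest_sv = np.linalg.svd(A, compute_uv=False)[0]        # largest singular value
induced = np.linalg.norm(A, 2)                            # induced L_2 matrix norm

print(spectral, largest_sv, induced)  # three equal values
```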
3.5 Measuring Error
Since we have notions of length and distance, we can quantify error: the discrepancy between the actual value of some data and the measured value. We can study the worst-case behavior of a system by applying a perturbation to the input and bounding the resulting perturbation in the output.
The notion of continuity is in this spirit: it says that if the error \(|v-x|\) in the measured \(x\) compared to the actual \(v\) is made sufficiently small, then the error \(|f(v)-f(x)|\) in the computed values is also small.
Throughout this section, we fix a norm \(|\cdot|\) on \(V=\mathbb{R}^n\) and let \(||\cdot||\) be the induced norm on linear maps.
Definition 3.5 Let \(u, v\in V\). The absolute error between \(u\) and \(v\) is \(|u - v|\).
If \(u\) and \(v\) are very large, then \(|u-v|\) may be quite large even if the vectors don’t seem to be that far apart; imagine starting with \(u\) and \(v\) that are close together and scaling both up by a large factor. This will magnify \(|u-v|\) by the same amount, but it doesn’t necessarily seem like the error has increased (though this depends on what you’re trying to do). So when the vectors are large, we might want to tolerate larger absolute errors. This motivates the next definition.
Definition 3.6 Let \(u, v\in\mathbb{R}^n\). The relative error of \(u\) from \(v\) is
\[\frac{|u - v|}{|v|}.\]
Unlike absolute error, relative error is not symmetric because it depends on the choice of denominator. When a choice is possible, we will let the denominator be the actual value, rather than the measured value.
Theorem 3.9 Let \(T:V\to V\) be an invertible linear map, and suppose \(T(x) = y\) and \(T(\tilde x) = \tilde y\). Then, the errors obey the following:
- In terms of absolute error, \[ \frac{1}{||T^{-1}||} |\tilde x - x| \leq |\tilde y - y| \leq ||T|||\tilde x - x| \]
- In terms of relative error, \[ \frac{1}{||T^{-1}||||T||}\frac{|\tilde x - x|}{|x|} \leq \frac{|\tilde y - y|}{|y|} \leq ||T^{-1}||||T||\frac{|\tilde x - x|}{|x|}. \]
Proof. Since \(\tilde y - y = T(\tilde x - x)\) by linearity, and also \(T^{-1}(\tilde y - y) = \tilde x - x\), we can apply submultiplicativity in two ways to get both a lower and an upper bound on \(|\tilde y -y|\): \[ \frac{1}{||T^{-1}||}|\tilde x - x| \leq |\tilde y - y| \leq ||T|||\tilde x - x|, \]
and the same fact also gives us a bound on \(|y| = |T(x)|\), which is \[ ||T|||x| \geq |y| \geq \frac{1}{||T^{-1}||}|x|. \]
Dividing the former chain of inequalities by the latter yields the claim.
The previous theorem shows that the quantity \(||T^{-1}||||T||\) is useful for describing how \(T\) magnifies errors.
Definition 3.7 The condition number of a linear map \(T:V\to V\) between normed vector spaces is \(||T^{-1}||||T||\), denoted \(\kappa(T)\).
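For concreteness, here is a small NumPy experiment (a sketch; the matrix, vectors, and perturbation are arbitrary choices, and `np.linalg.cond` uses the \(L_2\)-induced norm by default) illustrating the condition number and the relative-error bound of Theorem 3.9:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])    # nearly singular, hence badly conditioned
x = np.array([1.0, 2.0])
y = A @ x

# Perturb the right-hand side slightly and solve again.
y_tilde = y + np.array([1e-4, -1e-4])
x_tilde = np.linalg.solve(A, y_tilde)

rel_in = np.linalg.norm(y_tilde - y) / np.linalg.norm(y)
rel_out = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
kappa = np.linalg.cond(A)          # condition number ||A^{-1}|| ||A|| in the 2-norm

print(rel_in, rel_out, kappa)
print(rel_out <= kappa * rel_in)   # the relative-error bound of Theorem 3.9 holds
```

A tiny relative error in \(y\) is magnified by roughly the condition number, which here is about \(4\times 10^4\).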
Example 3.4 Let’s consider a more complicated scenario. Suppose Alice has measured a data vector \(\tilde b\) and a matrix \(\tilde A\). Both of them might contain some error. Let \(A\) and \(b\) denote the unknown true values. Now, for some reason, she needs to solve the system of equations \(Ax = b\), and she wants an upper bound on the relative error of \(\tilde x\) from \(x\) in terms of \(\tilde A\), \(\tilde b\), and their estimated error bounds. How could she do so if she happens to know that \(\tilde b = b\) is accurate?
Well, since \(\tilde b = b\), the quantity she actually computes is \(\tilde x = \tilde A^{-1}\tilde b = \tilde A^{-1}b\), so we can calculate \[ |\tilde x - x| = |\tilde A^{-1}b - x| = |\tilde A^{-1}Ax - x| \leq ||\tilde A^{-1}A - I||\,|x|. \]
Therefore
\[\begin{align*} \frac{|\tilde x - x|}{|x|} &\leq ||\tilde A^{-1}A - I||\\ &= ||\tilde A^{-1}(A - \tilde A)||\\ &\leq ||\tilde A^{-1}|| ||A - \tilde A|| \end{align*}\]
This gives us a bound on relative error.
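A numerical version of Alice's situation (a sketch; the matrices, the perturbation, and the use of the \(L_2\)-induced norm are made-up illustrative choices):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                     # unknown "true" matrix
A_tilde = A + 1e-3 * np.array([[1.0, -2.0],
                               [0.5,  1.0]])   # measured matrix with a small error
b = np.array([1.0, 2.0])                       # assume b was measured exactly

x = np.linalg.solve(A, b)               # true solution (unknown to Alice)
x_tilde = np.linalg.solve(A_tilde, b)   # what Alice actually computes

rel_err = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
bound = np.linalg.norm(np.linalg.inv(A_tilde), 2) * np.linalg.norm(A - A_tilde, 2)

print(rel_err, bound)     # the computed relative error sits below the bound
print(rel_err <= bound)
```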
Exercises
Exercise 3.1 The \(L_1\) norm is just the sum of absolute values of the components. Show that it is a norm. Draw a picture of the unit ball for this norm.
Exercise 3.2 Show that the \(L_\infty\) norm is, in fact, a norm. Draw a picture of the unit ball for this norm.
Exercise 3.3 Let \(V\) be a finite-dimensional vector space over \(\mathbb R\). An inner product on \(V\) is a positive-definite and symmetric bilinear form, which is to say it is a function \(\_\cdot \_ : V\times V\to \mathbb R\) with the following properties:
- \(x\cdot x \geq 0\), and if \(x\cdot x = 0\), then \(x=0\). [positive definite]
- \(x\cdot y = y\cdot x\). [symmetric]
- \((ax+by)\cdot z = a(x\cdot z) + b(y \cdot z)\). [bilinear]
Prove the Cauchy-Schwarz inequality: \[|x\cdot y| \leq |x||y|,\] where \(|x|\) denotes \(\sqrt{x\cdot x}\). Hint: Argue that you can assume \(|x|=|y|=1\). Then, for fixed \(x\) and \(y\), consider the function \[f(a) = (ax + y)\cdot (ax + y).\] It is a nonnegative (why?) quadratic function of \(a\). Find the location of its minimum, and work out what it means for the value at the minimum to be nonnegative.
Then, prove that \(|\cdot|\) defines a norm. The triangle inequality is the tricky step. Hint: expand \(0\leq |x+y|^2 = (x+y)\cdot (x+y)\), use the triangle inequality for real numbers, and apply the Cauchy-Schwarz inequality somewhere.
Exercise 3.4 Verify the “reverse” triangle inequality for normed vector spaces: \[|x+y| \geq \left||x|-|y|\right|.\]
Exercise 3.5 Show that the norm map on a normed vector space is continuous.
Exercise 3.6 Show that if \(|v_n|\to 0\) as real numbers, then \(v_n \to 0\) as vectors. Further show that if \(v_n \to v\) then \(|v_n|\to |v|\).
Exercise 3.7 Show that \(|\cdot|_1\), \(|\cdot|_2\), and \(|\cdot|_\infty\) on \(\mathbb R^2\) are all equivalent. In particular, verify
\[|x|_1 \leq \sqrt 2|x|_2 \leq 2 |x|_\infty \leq 4 |x|_1.\]
Exercise 3.8 Find an example where the inequality in submultiplicativity (for norms of linear maps) is strict.
Exercise 3.9 A subset \(C\) of a \(\mathbb R\)-vector space \(V\) is called ``convex’’ if it contains all weighted averages of its elements, which is to say that for all \(x,y\in C\) and \(t\in [0,1]\), the vector \[tx + (1-t)y\] is also in \(C\).
Let \(C\) be a convex set containing \(0\) and which spans \(V\). Show that the function \(|\cdot|_C\) given by \[|v|_C = \inf \{|a| : a\in K,\ v\in a C\}\] is well-defined, and that it defines a norm. The notation \(aC\) means the set of all multiples of elements of \(C\) by \(a\).
Conversely, if \(|\cdot |\) is a norm, prove that its unit ball \(B\) is convex, spans \(V\), and that \(|\cdot|_B\) coincides with \(|\cdot|\). So every (real) norm comes from a convex body.
Exercise 3.10 Suppose \(|\cdot|_C\) and \(|\cdot|_{C'}\) are norms defined from convex bodies. Prove that they are equivalent norms if and only if there are scalars \(a\) and \(b\) such that \[aC\subseteq C'\] and \[bC'\subseteq C.\]
Exercise 3.11 Consider a monic polynomial \[p(x) = a_0 + a_1x + a_2x^2 + ... + a_{n-1}x^{n-1} + x^n.\] Show that the characteristic polynomial of \[\begin{bmatrix} 0&1&0&\dots&0\\ 0&0&1&\ddots&\vdots \\ \vdots & \ddots & \ddots & \ddots & 0\\ 0&\dots&\dots &0& 1 \\ -a_0 & -a_1 & \dots & \dots & -a_{n-1} \end{bmatrix}\] is \(p(x)\). What does the Gershgorin circle theorem tell you about the roots of \(p(x)\)?
Exercise 3.12 All norms are the \(\infty\) norm in this problem.
An optimization problem gives rise to a system of linear equations \[Lx = b,\] where \(||L||=4\) and \(||L^{-1}|| = 1/2\).
You have an estimate \(\tilde b\) for \(b\), so you end up solving \[L\tilde x = \tilde b.\]
- Suppose the relative error in \(b\) is \(0.1\). Bound the relative error of \(x\).
- Suppose \(|b| = 40\) and hence the absolute error \(|b - \tilde b|\) of \(b\) is at most \(4\) (using the relative error from (a)). What is a bound for the absolute error on \(x\)?
- Continuing to assume \(|b|=40\), bound \(|x|\).